23 research outputs found

    Colossal Trajectory Mining: A unifying approach to mine behavioral mobility patterns

    Spatio-temporal mobility patterns are at the core of strategic applications such as urban planning and monitoring. Depending on the strength of spatio-temporal constraints, different mobility patterns can be defined. While existing approaches work well in the extraction of groups of objects sharing fine-grained paths, the huge volume of large-scale data calls for coarse-grained solutions. In this paper, we introduce Colossal Trajectory Mining (CTM) to efficiently extract heterogeneous mobility patterns out of a multidimensional space that, along with the space and time dimensions, can consider additional trajectory features (e.g., means of transport or activity) to characterize behavioral mobility patterns. The algorithm is natively designed in a distributed fashion, and the experimental evaluation shows its scalability with respect to the involved features and the cardinality of the trajectory dataset.

    Conversational OLAP

    The democratization of data access and the adoption of OLAP in scenarios requiring hands-free interfaces push towards the creation of smart OLAP interfaces. In this paper, we describe COOL, a framework devised for COnversational OLap applications. COOL interprets and translates a natural language dialog into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query and continues with the application of OLAP operators. The interpretation relies on a formal grammar and on a repository storing metadata and values from a multidimensional cube. In case of an ambiguous text description, COOL can obtain the correct query either through automatic inference or through user interactions to disambiguate the text.

    Cost-based Optimization of Multistore Query Plans

    Multistores are data management systems that enable query processing across different and heterogeneous databases; besides the distribution of data, complexity factors like schema heterogeneity and data replication must be resolved through integration and data fusion activities. Our multistore solution relies on a dataspace to provide the user with an integrated view of the available data and enables the formulation and execution of GPSJ queries. In this paper, we propose a technique to optimize the execution of GPSJ queries by formulating and evaluating different execution plans on the multistore. In particular, we outline different strategies to carry out joins and data fusion by relying on different schema representations; then, a self-learning black-box cost model is used to estimate execution times and select the most efficient plan. The experiments assess the effectiveness of the cost model in choosing the best execution plan for the given queries and exploit multiple multistore benchmarks to investigate the factors that influence the performance of different plans.

    Multidimensional integration of RDF datasets

    Data providers have been uploading RDF datasets on the web to aid researchers and analysts in finding insights. These datasets, made available by different data providers, contain common characteristics that enable their integration. However, since each provider has its own data dictionary, identifying common concepts is not trivial, and costly and complex entity resolution and transformation rules are required to perform such integration. In this paper, we propose a novel method that, given a set of independent RDF datasets, provides a multidimensional interpretation of these datasets and integrates them based on a common multidimensional space (if any) identified. To do so, our method first identifies potential dimensional and factual data in the input datasets and performs entity resolution to merge common dimensional and factual concepts. As a result, we generate a common multidimensional space and identify each input dataset as a cuboid of the resulting lattice. With such output, we are able to exploit open data with OLAP operators in a richer fashion than dealing with them separately. This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence-Doctoral College (IT4BI-DC) program. Peer reviewed; postprint (author's final draft).

    Towards conversational OLAP

    The democratization of data access and the adoption of OLAP in scenarios requiring hands-free interfaces push towards the creation of smart OLAP interfaces. In this paper, we envisage a conversational framework specifically devised for OLAP applications. The system converts natural language text into GPSJ (Generalized Projection, Selection and Join) queries. The approach relies on an ad-hoc grammar and a knowledge base storing multidimensional metadata and cube values. In case of an ambiguous or incomplete query description, the system is able to obtain the correct query either through automatic inference or through interactions with the user to disambiguate the text. Our tests show very promising results both in terms of effectiveness and efficiency.

    COOL: A framework for conversational OLAP

    The democratization of data access and the adoption of OLAP in scenarios requiring hands-free interfaces push towards the creation of smart OLAP interfaces. In this paper, we introduce COOL, a framework devised for COnversational OLap applications. COOL interprets and translates a natural language dialog into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query and continues with the application of OLAP operators. The interpretation relies on a formal grammar and on a repository storing metadata and values from a multidimensional cube. In case of an ambiguous or incomplete text description, COOL can obtain the correct query either through automatic inference or through user interactions to disambiguate the text. Our tests show very promising results in terms of effectiveness, efficiency, and user experience. Besides adding novel support to the interpretation and translation of complete analytical OLAP sessions, COOL achieves an average accuracy of 94% in the interpretation of GPSJ queries from real datasets.

    Meta-stars: multidimensional modeling for social business intelligence

    Social business intelligence is the discipline of combining corporate data with user-generated content (UGC) to let decision-makers improve their business based on the trends perceived from the environment. A key role in the analysis of textual UGC is played by topics, meant as specific concepts of interest within a subject area. To enable aggregations of topics at different levels, a topic hierarchy must be defined. Some attempts have been made to address some of the peculiarities of topic hierarchies, but no comprehensive solution has been found so far. The approach we propose to model topic hierarchies in ROLAP systems is called meta-stars. Its basic idea is to use meta-modeling coupled with navigation tables and with traditional dimension tables: navigation tables support hierarchy instances with different lengths and with non-leaf facts, and allow different roll-up semantics to be explicitly annotated; meta-modeling enables hierarchy heterogeneity and dynamics to be accommodated; dimension tables are easily integrated with standard business hierarchies. After outlining a reference architecture for social business intelligence and describing the meta-star approach, we discuss its effectiveness and efficiency by showing its querying expressiveness and by presenting some experimental results for query performance.

    Business Intelligence and Analytics: On-demand ETL over Document Stores

    For many decades, Business Intelligence and Analytics (BI&A) has been associated with relational databases. In the era of big data and NoSQL stores, it is important to provide approaches and systems capable of analyzing this type of data for decision-making. In this paper, we present a new BI&A approach that both (i) extracts, transforms and loads the required data for OLAP analysis (on-demand ETL) from document stores, and (ii) provides the models and the systems required for suitable OLAP analysis. Here we focus on the on-demand ETL stage where, unlike existing works, we consider the dispersion of data over two or more collections.

    Answering GPSJ queries in a polystore: A dataspace-based approach

    The discipline of data science is steering analysts away from traditional data warehousing and towards a more flexible and lightweight approach to data analysis. The idea is to perform OLAP analyses in a pay-as-you-go manner across heterogeneous schemas and data models, where the integration is progressively carried out by the user as the available data is explored. In this paper, we propose an approach to support data analysis within a polystore supporting relational, document and column data models by automatically handling both data model and schema heterogeneity through a dataspace layer on top of the underlying databases. The expressiveness we enable corresponds to GPSJ queries, which are the most common class of queries in OLAP applications. We rely on Nested Relational Algebra to define a cross-database execution plan. The plan is composed of several local plans, to be executed on the distinct databases, and a global plan, which combines and possibly aggregates inter-database data. The system has been prototyped on Apache Spark.

    DART: De-Anonymization of personal gazetteers through social trajectories

    The interest in trajectory data has increased considerably since the spread of mobile devices. Simple clustering techniques allow the recognition of personal gazetteers, i.e., the set of main points of interest (also called stay points) of each user, together with the list of time instants of each visit. Due to their sensitivity, personal gazetteers are usually anonymized, but their inherent unique patterns expose them to the risk of being de-anonymized. In particular, social trajectories (i.e., those obtained from social networks, which associate statuses and check-ins to spatial and temporal locations) can be leveraged by an adversary to de-anonymize personal gazetteers. In this paper, we propose DART as an innovative approach to effectively de-anonymize personal gazetteers through social trajectories, even in the absence of a temporal alignment between the two sources (i.e., they have been collected over different periods). DART relies on a big data implementation, guaranteeing the scalability to large volumes of data. We evaluate our approach on two real-world datasets and we compare it with recent state-of-the-art algorithms to verify its effectiveness.